1. Introduction: background and motivation

Stein's method is a technique for obtaining bounds on a "distance" between an 
unknown probability distribution and a given target distribution. The method 
stems from two papers published in the 1970's by Charles Stein (concerning 
Gaussian approximation, see [33]) and Louis H. Chen (concerning Poisson ap- 
proximation, see [9]). Since those days a substantial body of work has been 
devoted to extensions of the method, the literature on the subject now being 
vast and varied. We refer the reader to the monographs [4], [2], [3] or [11]. 

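The gist of the method can be summarized as follows. Suppose that, for
a given target distribution $g : \mathcal{X} \to \mathbb{R}$ dominated by a measure $\mu$ on some
probability space $(\mathcal{X}, \mathcal{A}, \mu)$, there exists a class of functions
$\mathcal{F}(g) \subset \mathcal{X}^* := \{\psi : \mathcal{X} \to \mathbb{R},\ \mu\text{-measurable}\}$ and an operator
$\mathcal{T}(\cdot, g) : \mathcal{X}^* \to \mathcal{X}^*$ such that

$$X \sim g(\cdot) \iff \mathrm{E}[\mathcal{T}(f, g)(X)] = 0 \text{ for all } f \in \mathcal{F}(g), \tag{1.1}$$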

"This is an original survey paper 
where by $X \sim g(\cdot)$ we mean $\mathrm{P}(X \in A) = \int_A g(y)\,d\mu(y)$ for all measurable $A \subset \mathcal{X}$.
Now let $Z \sim g(\cdot)$ and suppose that we are interested in studying a random object
$W$ whose distribution we do not know but which we believe to be approximately
that of $Z$. After choosing a metric $d_{\mathcal{H}}(W, Z) := \sup_{h \in \mathcal{H}} |\mathrm{E}[h(W)] - \mathrm{E}[h(Z)]|$ for
our approximation (where $\mathcal{H}$ is also a certain class of functions), the first step
in the Stein method consists in writing

$$d_{\mathcal{H}}(W, Z) = \sup_{h \in \mathcal{H}} |\mathrm{E}[\mathcal{T}(f_h, g)(W)]| \tag{1.2}$$

with, for all $h \in \mathcal{H}$, $f_h$ the solution of the so-called Stein equation

$$\mathcal{T}(f_h, g)(x) = h(x) - \mathrm{E}[h(Z)]. \tag{1.3}$$

The intuitive reason why (1.2) is an interesting quantity to study is the
following: $Z$ satisfies the rhs of (1.1), thus if the law of $W$ is close to that of $Z$,
then (1.1) should be nearly satisfied and the rhs of (1.2) should be close to 0
for all $h$ such that $f_h \in \mathcal{F}(g)$. Hence $|\mathrm{E}[\mathcal{T}(f_h, g)(W)]|$ is an indicator of the
$\mathcal{H}$-distance between $W$ and $Z$.

The secret behind the method is that not only is the intuition outlined in the 
previous paragraph correct, but also, as it turns out, the rhs of (1.2) happens to 
be often "easier" to bound, making (1.2) a good starting point for a wide family 
of stochastic approximation problems. Determining equations of the form (1.1)
for a given distribution $g$ is the crucial starting point of this method. For instance,
Stein [33] showed that (1.1) holds for the Gaussian with $\mathcal{T}(f, g)(x) = f'(x) - x f(x)$
and $\mathcal{F}(g)$ the class of differentiable functions on $\mathbb{R}$; Chen [9] showed a
similar relationship for the rate-$\lambda$ Poisson distribution, with
$\mathcal{T}(f, g)(x) = \lambda f(x + 1) - x f(x)$ and $\mathcal{F}(g)$ the class of all bounded functions on $\mathbb{Z}$. After identifying a
suitable characterization, the usual methodology relies on three steps, namely (i)
solving (1.3) for all $h \in \mathcal{H}$, (ii) deriving bounds - the so-called magic factors - on
the corresponding solutions, and (iii) applying the right tool (exchangeable pairs, 
zero- or size-biased distributions, truncation, etc.) in order to obtain explicit 
bounds on the rhs of (1.2) through the bounds obtained in step (ii). We refer 
the reader to the recent survey [31] for an overview. 
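
The two classical identities quoted above are easy to check numerically. The following minimal sketch (standard library only) approximates E[f'(X) - X f(X)] for a standard Gaussian X by the trapezoidal rule, and E[lam f(X + 1) - X f(X)] for a Poisson X by a truncated sum, the latter being the usual form of Chen's identity. The function names, the integration range, the truncation level and the test functions are illustrative assumptions, not part of the original text.

```python
import math

def gaussian_stein_expectation(f, df, lo=-10.0, hi=10.0, n=20001):
    """Trapezoidal approximation of E[f'(X) - X f(X)] for X ~ N(0, 1)."""
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        weight = 0.5 if i in (0, n - 1) else 1.0  # trapezoid end-point weights
        total += weight * (df(x) - x * f(x)) * phi
    return total * h

def poisson_stein_expectation(f, lam, terms=200):
    """Truncated-sum approximation of E[lam f(X + 1) - X f(X)], X ~ Poisson(lam)."""
    total = 0.0
    pmf = math.exp(-lam)  # P(X = 0)
    for x in range(terms):
        total += (lam * f(x + 1) - x * f(x)) * pmf
        pmf *= lam / (x + 1)  # recursion P(X = x + 1) = P(X = x) * lam / (x + 1)
    return total

# Illustrative test functions: f = sin for the Gaussian, f(x) = 1 / (1 + x) for the Poisson.
gauss_val = gaussian_stein_expectation(math.sin, math.cos)
pois_val = poisson_stein_expectation(lambda x: 1.0 / (1.0 + x), lam=3.0)
```

Both values should vanish up to discretization and rounding error, which is the left-to-right implication in (1.1) for these two targets.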

This method has been applied in a wide number of problems. While the bulk 
of the (now vast) literature on this subject is devoted to Poisson and normal 
approximation problems, there have also been extensions towards other
non-standard densities, particularly so in recent years. Götze and Tikhomirov, for
instance, use a characterization of the semi-circular law to obtain rates of
convergence for spectra of random matrices with martingale structure (see [17]).
Chatterjee, Fulman and Röllin use two different characterizations of the exponential
distribution to obtain general results for convergence towards an exponential
distribution (see [6]); they illustrate the applications of their methodology in a
study of the spectrum of graphs with a non-normal limit. In [35], Stein, Diaconis,
Holmes and Reinert obtain a characterization - by means of what is now called
the density approach - of all regular distributions with a regular derivative (a
function is regular if it is bounded and has at most countably many
discontinuity points on its support, see also [34] and [12]); they use this in the analysis of
simulations. Eichelsbacher and Löwe [13] and Chatterjee and Shao [8] use the
general characterizations of [35] to obtain general non-Gaussian approximation
theorems, relevant for example in the field of statistical mechanics. Extensions to a
multivariate setting are also available for the multivariate Gaussian law (see, for 
instance, [18], [7], or [30]). There exists a uniform treatment of the univariate 
discrete case, by means of Gibbs measures, which can be found in [14]. In [29] a
characterization of the Kummer-U function is used to study degree asymptotics
with rates for preferential attachment random graphs. Extensions of the method
to continuous time processes are currently the object of active research (see [26], 
[28] or [27] and the references therein). This list is of course not exhaustive, and 
the method has also been used for binomial, negative binomial, multinomial,
gamma, $\chi^2$ and many other target distributions (see [11]).

In this paper we will not address Stein's method per se, but rather con- 
centrate on the characterizations (1.1) which are known, in the literature, as 
Stein characterizations. These have, so far, never been the subject of a specific 
treatment in the literature, and have always been introduced, through largely 
case-by-case arguments, as a means to an end rather than as an object of in- 
trinsic interest. This is perhaps explained by the nature of the different target 
distributions (discrete, continuous, bounded or unbounded support, etc.) which 
make it complicated to try to unify all these characterizations under a single um- 
brella (in general different choices of target distributions will require imposing 
different combinations of restrictive assumptions) . The purpose of this article is 
to exploit the similarities between all the characterizations discussed above in 
order to show how all these results can be seen as different instances of a unique 
phenomenon. 

As it turns out, not only does our approach allow for (re-)obtaining the 
characterizations mentioned above (as well as many others) but it also simplifies 
the resulting proofs and allows one to identify clearly the minimal conditions on the
target densities under which such characterizations hold. More importantly it
opens new lines of research, and builds hitherto unsuspected bridges between
Stein's method and information theory.

The outline of the paper is the following. In Section 2 we discuss different 
Stein characterizations and use this discussion to provide the heuristic behind 
our approach. This heuristic is formalized in Section 3 where we also prove our 
main result, Theorem 3.1, which provides a general and simple characteriza- 
tion theorem for a very broad family of (discrete and continuous, univariate 
and multivariate) distributions. In Section 4 we illustrate the consequences of 
Theorem 3.1 by providing characterizations for important classes of parametric 
distributions, namely the location and the scale families, as well as a character- 
ization for discrete distributions. We apply our findings in a number of illustra- 
tive examples in Section 5, and uncover a couple of unpublished (to the best 
of our knowledge) characterizations. In Section 6, we discuss a couple of
potential applications of our results. Finally, Appendix A collects the more technical
proofs. 

2. Stein characterizations

In this section we provide the heuristic behind our approach. The arguments 
are constructive: starting from the Gaussian distribution we generalize so as to 
obtain the weakest possible assumptions for the most general result. 

2.1. The density approach 

Let $\phi$ stand for the standard Gaussian density. In a seminal paper [33], Charles
Stein introduced the relationship

$$X \sim \mathcal{N}(0, 1) \iff \mathrm{E}[f'(X) - X f(X)] = 0 \text{ for all } f \in \mathcal{F}(\phi), \tag{2.1}$$

with $\mathcal{F}(\phi)$ the collection of all differentiable real functions for which the
expectation in (2.1) is defined. There exist many proofs of (2.1) (see, e.g., [19], [10]
or [26]). We opt to present a different argument which enjoys the advantage of
being transferable to virtually any continuous target density.

First remark that $-x = \phi'(x)/\phi(x)$, so that equation (2.1) can be equivalently
rewritten

$$\begin{aligned}
h(x) \propto \phi(x) &\iff \int_{\mathbb{R}} \left( f'(x) + \frac{\phi'(x)}{\phi(x)} f(x) \right) h(x)\,dx = 0 \text{ for all } f \in \mathcal{F}(\phi) \\
&\iff \int_{\mathbb{R}} \frac{(f(x)\phi(x))'}{\phi(x)}\, h(x)\,dx = 0 \text{ for all } f \in \mathcal{F}(\phi),
\end{aligned} \tag{2.2}$$

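where $h : \mathbb{R} \to \mathbb{R}^+$ is some density. The sufficient condition in (2.2) is immediate
via integration by parts (the implicit boundary conditions on $f \in \mathcal{F}(\phi)$ ensuring
that the constant term vanishes). To prove the necessity, choose for $A \subset \mathbb{R}$ a
test function $f_A \in \mathcal{F}(\phi)$ that satisfies the differential equation

$$\frac{(f_A(x)\phi(x))'}{\phi(x)} = \mathbb{I}_A(x) - \int_A \phi(y)\,dy \tag{2.3}$$

for $\mathbb{I}_A$ the indicator of $A$. If such an $f_A$ exists and if it belongs to $\mathcal{F}(\phi)$, then (2.2)
guarantees that

$$\int_A h(x)\,dx = \int_A \phi(x)\,dx \quad \text{for all } A \subset \mathbb{R},$$

and thus $h = \phi$. Of course (2.3) is easily solved explicitly, yielding the candidate
solution

$$f_A(x) = \frac{1}{\phi(x)} \int_{-\infty}^{x} \left( \mathbb{I}_A(y) - \int_A \phi(u)\,du \right) \phi(y)\,dy, \tag{2.4}$$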



a function which, for all $A \subset \mathbb{R}$, is readily shown to satisfy all the requirements
for belonging to $\mathcal{F}(\phi)$ (see, e.g., [10]). Hence the result holds. Now note that
the above argument relies nowhere on specific properties of the Gaussian $\phi$, but
rather only on boundary and integrability conditions implicit in the definition
of $\mathcal{F}(\phi)$ and in the solution (2.4). It therefore suffices to replace $\phi$ by some
generic density $g$ in all the above arguments and to work out conditions on
$g$ so that everything runs smoothly in order to deduce, from (2.1), a general
characterization theorem for continuous distributions.

To the best of our knowledge such a general characterization result was pre- 
sented for the first time in [35] under a slightly different form; the only earlier 
similar attempt we have found is provided in [32] where a construction of Stein 
operators for Pearson and Ord families of distributions is provided. Stein's [35] 
result is now known in the literature as the density approach and allows for 
recovering the Gaussian characterization (2.1), the exponential characterization 
from [6] or the following two examples (which are also provided in [35]). 

Example 2.1. Let $-\infty < a < b < \infty$. A random variable $Z$ is $U[a, b]$ if and
only if $(b - a)\,\mathrm{E}[f'(Z)] = f(b^-) - f(a^+)$ for all differentiable functions $f$.

Example 2.2. Let $\lambda > 0$. A random variable $Z$ is $\mathrm{Exp}(\lambda)$ if and only if
$\mathrm{E}[f'(Z) - \lambda f(Z)] = -\lambda f(0^+)$ for all differentiable functions $f$.
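
As a sanity check, the direct implications in these two examples can be verified by quadrature. The sketch below (standard library only) evaluates the gap between the two sides of each identity, with the uniform identity written with its normalising factor made explicit, that is (b - a) E[f'(Z)] = f(b) - f(a). The helper names, the truncation point of the exponential tail and the test functions are assumptions made for illustration.

```python
import math

def trapezoid(func, lo, hi, n=200001):
    """Composite trapezoidal rule for func on [lo, hi] with n nodes."""
    h = (hi - lo) / (n - 1)
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, n - 1):
        total += func(lo + i * h)
    return total * h

def uniform_gap(f, df, a, b):
    """(b - a) E[f'(Z)] - (f(b) - f(a)) for Z ~ U[a, b]; should be close to 0."""
    mean_df = trapezoid(df, a, b) / (b - a)  # E[f'(Z)] under the density 1 / (b - a)
    return (b - a) * mean_df - (f(b) - f(a))

def exponential_gap(f, df, lam, upper=60.0):
    """E[f'(Z) - lam f(Z)] + lam f(0) for Z ~ Exp(lam); tail truncated at `upper`."""
    def integrand(x):
        return (df(x) - lam * f(x)) * lam * math.exp(-lam * x)
    return trapezoid(integrand, 0.0, upper) + lam * f(0.0)

# Illustrative test functions: a cubic for the uniform, cosine for the exponential.
u_gap = uniform_gap(lambda x: x ** 3, lambda x: 3.0 * x ** 2, a=-1.0, b=2.0)
e_gap = exponential_gap(math.cos, lambda x: -math.sin(x), lam=2.0)
```

Both gaps should be zero up to quadrature error; the boundary terms f(b) - f(a) and -lam f(0) are exactly what integration by parts leaves behind when the support has finite end-points.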

The denomination density approach is to be considered in analogy with the
generator approach due to Barbour [1] and Götze [16].

2.2. Location-based and scale-based characterizations 

There exist a number of outstanding characterizations which cannot be writ- 
ten in the form (2.2) such as e.g. those for discrete distributions or for the 
semi-circular distribution. For instance, in [6], a version of Stein's method for 
exponential approximation is developed, the arguments relying on two charac- 
terizations of the exponential distribution. The first, provided in Example 2.2, 
is an instance of the density approach. The second is given by

$$X \sim \mathrm{Exp}(1) \iff \mathrm{E}[X f'(X) - (X - 1) f(X)] = 0 \tag{2.5}$$

for all $f \in \mathcal{F}_2(\mathrm{Exp}(1))$, a "sufficiently large class" of functions. This
characterization is clearly not a consequence of the density approach. We nevertheless claim
that (2.5) stems from the same origin as the characterization in Example 2.2; to 
see this it is necessary to re-interpret these results in terms of concepts inherited 
from a statistical point of view. 

First recall how (2.2) was deduced by replacing the linear term $x$ in (2.1) with
the ratio $-g'/g$. This ratio is a familiar object in statistics: it is the score function
$\varphi(x - \mu_0) = (\partial_\mu g(x - \mu)|_{\mu = \mu_0}) / g(x - \mu_0)$, evaluated at $\mu_0 = 0$, associated with
the location parameter $\mu$ of a location family $g(x - \mu)$ of distributions. Here
$\partial_\mu$ stands for the derivative in the sense of distributions w.r.t. $\mu$. With this
parametric notation in hand, the characterization can be rewritten as

$$X \sim g(\cdot - \mu_0) \iff \mathrm{E}\left[ \frac{\partial_\mu \big( f(X - \mu)\, g(X - \mu) \big)\big|_{\mu = \mu_0}}{g(X - \mu_0)} \right] = 0 \tag{2.6}$$

for all $f \in \mathcal{F}(g; \mu_0)$ $(\supset \mathcal{F}(g))$, a sufficiently large class of functions depending
on both $g$ and $\mu_0$. This shows how the density approach can be seen as a
special instance (for $\mu_0 = 0$) of what we will henceforth call the location-based
characterization (2.6).

Next reconsider equation (2.5). For a given $\mathrm{Exp}(\sigma)$ distribution, the
parameter $\sigma$ is generally interpreted as a scale parameter. Writing out the argument
of the expectation in the rhs of (2.6) in terms of a scale parameter of a scale
family $\sigma g(\sigma x)$ of distributions leads to ($\partial_\sigma$ denotes the derivative in the sense
of distributions w.r.t. $\sigma$)

$$\frac{\partial_\sigma \big( f(\sigma x)\, \sigma g(\sigma x) \big)\big|_{\sigma = \sigma_0}}{\sigma_0\, g(\sigma_0 x)}.$$

For $g$ the density of an exponential distribution and $\sigma_0 = 1$, the latter expression
corresponds to $x f'(x) + f(x)(1 - x)$, which is the argument of the expectation
in (2.5) (note that for an exponential distribution, the support does not depend
on the scale parameter, hence no indicator function needs to be differentiated).
Thus, the second characterization of the $\mathrm{Exp}(1)$ given in [6] can be viewed as a
special instance (for $\sigma_0 = 1$) of what we will call a scale-based characterization
which, in its most general form, reads

$$X \sim \sigma_0\, g(\sigma_0\, \cdot) \iff \mathrm{E}\left[ \frac{\partial_\sigma \big( f(\sigma X)\, \sigma g(\sigma X) \big)\big|_{\sigma = \sigma_0}}{\sigma_0\, g(\sigma_0 X)} \right] = 0 \tag{2.7}$$

for all $f \in \mathcal{F}(g; \sigma_0)$, a sufficiently large class of functions depending on both $g$
and $\sigma_0$.
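
The exponential case of the scale-based construction can be checked numerically. The sketch below (standard library only) differentiates the map sigma -> f(sigma x) sigma exp(-sigma x) by central finite differences at sigma = 1, compares the result, divided by the density, with the closed form x f'(x) + (1 - x) f(x), and then verifies by quadrature that this expression has zero mean under Exp(1). The function names, the step size, the truncation point and the choice f = sin are illustrative assumptions.

```python
import math

def scale_operator_fd(f, x, sigma0=1.0, eps=1e-6):
    """Finite-difference version of the scale-based operator for the exponential:
    d/dsigma [f(sigma x) sigma exp(-sigma x)] at sigma0, divided by the density."""
    def fg(s):
        return f(s * x) * s * math.exp(-s * x)
    deriv = (fg(sigma0 + eps) - fg(sigma0 - eps)) / (2.0 * eps)
    return deriv / (sigma0 * math.exp(-sigma0 * x))

def scale_operator_closed(f, df, x):
    """Closed form at sigma0 = 1: x f'(x) + (1 - x) f(x)."""
    return x * df(x) + (1.0 - x) * f(x)

def exp1_expectation(func, upper=60.0, n=200001):
    """Trapezoidal approximation of E[func(X)] for X ~ Exp(1), truncated at `upper`."""
    h = upper / (n - 1)
    total = 0.5 * (func(0.0) + func(upper) * math.exp(-upper))
    for i in range(1, n - 1):
        x = i * h
        total += func(x) * math.exp(-x)
    return total * h

# Pointwise agreement at an arbitrary point, then the zero-mean property.
pointwise_gap = scale_operator_fd(math.sin, 0.7) - scale_operator_closed(math.sin, math.cos, 0.7)
mean_value = exp1_expectation(lambda x: scale_operator_closed(math.sin, math.cos, x))
```

The first quantity confirms the algebra behind the exponential instance of the scale-based operator; the second illustrates the direct implication in (2.5) for one test function.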

The location- and scale-based characterizations provided above do not, how- 
ever, cover Chen's characterization of the Poisson distribution, to cite but this 
well-known example. Moreover, upon further thought, there is no intuitive jus- 
tification which would explain why only location and scale parameters should 
play a special role; the tail parameter of a Student distribution or the upper 
and lower bounds of a uniform distribution over some interval [a, b] should also 
be allowed to play a crucial role in such characterizations, as well as, e.g., the 
parameter $\lambda$ of the Poisson distribution. As it turns out, there exists a much
neater and more efficient general framework in which both the above "general" results
turn out to be straightforward particular cases.

2.3. A general characterization result 

In this section we fix, for simplicity, $\mu_0 = 0$ and $\sigma_0 = 1$. Naively exploiting
the similarities between (2.6) and (2.7) encourages us to propose the following
general conjecture.

Conjecture 1. Let $g(x; \theta)$ be a parametric family of densities with parameter $\theta$.
Suppose that $g(x; \theta)$ satisfies a number of regularity conditions. Fix a value $\theta_0$
of $\theta$ and denote by $\partial_\theta$ the derivative in the sense of distributions w.r.t. $\theta$. Then

$$X \sim g(\cdot; \theta_0) \iff \mathrm{E}\left[ \frac{\partial_\theta \big( f(X; \theta)\, g(X; \theta) \big)\big|_{\theta = \theta_0}}{g(X; \theta_0)} \right] = 0 \tag{2.8}$$

for all $f \in \mathcal{F}(g; \theta_0)$, a sufficiently large class of functions depending on both $g$
and $\theta_0$.

This Conjecture, if true, would enjoy several advantages: all kinds of
parameters $\theta$ could appear, and no difference would be made between the continuous
and the discrete case. While promising, the main drawback of (2.8) consists in
the fact that the conditions on the target density $g$, as well as the structure
of the family of test functions $\mathcal{F}(g; \theta_0)$ under which the Conjecture holds true,
remain mysterious. In order to clarify this issue, one final argument needs to be
invoked, the origin of which lies, once again, in a statistical approach to such
identities.

Let $g(x; \theta)$ be as in the Conjecture above. A classical result in likelihood
theory states that, under regularity conditions, the expectation of the score
function $\partial_\theta g(x; \theta)/g(x; \theta)$ vanishes. The proof is very simple. Let $(\mathcal{X}, m_{\mathcal{X}})$ be
a measure space (e.g., $\mathbb{R}$ equipped with the Lebesgue measure or $\mathbb{Z}$ equipped
with the counting measure). Since $\int_{\mathcal{X}} g(x; \theta)\,dm_{\mathcal{X}}(x) = 1$, differentiating w.r.t.
$\theta$ on both sides yields $\int_{\mathcal{X}} \partial_\theta g(x; \theta)\,dm_{\mathcal{X}}(x) = 0$, provided that the derivative and
the integral are interchangeable. This immediately shows that the expectation
under $g(x; \theta)$ of $\partial_\theta g(x; \theta)/g(x; \theta)$ equals zero. Now, under $g(\cdot; \theta_0)$, the rhs of
(2.8) corresponds to $\int_{\mathcal{X}} \partial_\theta (f(x; \theta) g(x; \theta))|_{\theta = \theta_0}\,dm_{\mathcal{X}}(x) = 0$, which, under the
condition of interchangeability of derivatives w.r.t. $\theta$ and integration w.r.t. $x$,
can be rewritten as $\partial_\theta \big( \int_{\mathcal{X}} f(x; \theta) g(x; \theta)\,dm_{\mathcal{X}}(x) \big)|_{\theta = \theta_0} = 0$. Thus, by analogy
with the proof of the likelihood-based result, we see that, in order to belong to
the class $\mathcal{F}(g; \theta_0)$, a test function $f$ should satisfy the following three natural
conditions in some neighborhood $\Theta_0$ of $\theta_0$:

(i) there exists a real constant $c_f$ such that $\int_{\mathcal{X}} f(x; \theta) g(x; \theta)\,dm_{\mathcal{X}}(x) = c_f$ for
all $\theta \in \Theta_0$;

(ii) the mapping $\theta \mapsto f(x; \theta) g(x; \theta)$ is differentiable over $\Theta_0$;

(iii) the differentiation w.r.t. $\theta$ and the integral sign are interchangeable for all
$\theta \in \Theta_0$.



These conditions will be made more precise in Definition 3.1 of the next section.
As we shall see, the first of these conditions yields the form of the candidate
functions $f(x; \theta)$ (for instance $x \mapsto f(x - \theta)$ in the location case and $x \mapsto f(\theta x)$
in the scale case) and the second and third explain the sometimes complicated
conditions imposed on the test functions in the relevant literature.

As a conclusion we stress an important fact: nowhere in the above argument 
do we rely on the target density to be continuous. As we will show in the follow- 
ing section, the heuristic outlined above holds irrespective of the nature of the 
target density, and (2.8) carries, as particular instances, the known character- 
izations for the Gaussian, the uniform, the exponential, the semi-circular, the 
Poisson and the geometric, to cite but these. 

3. Characterizations in terms of a parameter of interest

In this section we present the main result of this paper, Theorem 3.1, which 
provides a unified framework for constructing Stein characterizations - by means 
of a characterizing class of test functions and a characterizing operator - for 
univariate, multivariate, discrete and continuous distributions. As announced in 
the previous section, we show that all these results allow for an interpretation 
in terms of a parameter of interest of the target distribution. 

3.1. Notations and definitions 

We first need to clearly identify the notations and vocabulary which will be
used from now on. Throughout, we let $k, p \in \mathbb{N}_0$ and consider the two measure
spaces $(\mathcal{X}, \mathcal{B}_{\mathcal{X}}, m_{\mathcal{X}})$ and $(\Theta, \mathcal{B}_\Theta, m_\Theta)$, where $\mathcal{X}$ is either $\mathbb{R}^k$ or $\mathbb{Z}^k$, where $\Theta$ is
a subset of $\mathbb{R}^p$ whose interior is non-empty, where $m_{\mathcal{X}}$ is either the Lebesgue
measure or the counting measure, depending on the nature of $\mathcal{X}$, where $m_\Theta$ is
the Lebesgue measure, and where $\mathcal{B}_{\mathcal{X}}$ and $\mathcal{B}_\Theta$ are the corresponding $\sigma$-algebras.
In this setup we disregard the case of discrete parameter spaces (as in, e.g., the
discrete uniform); such distributions are briefly addressed in Remark 3.4 at the
end of the current section.

Consider a couple $(\mathcal{X}, \Theta)$ equipped with the corresponding $\sigma$-algebras and
measures. We say that the measurable function $g : \mathcal{X} \times \Theta \to \mathbb{R}^+$ forms a family
of $\theta$-parametric densities, denoted by $g(\cdot; \theta)$, if $\int_{\mathcal{X}} g(x; \theta)\,dm_{\mathcal{X}}(x) = 1$ for all
$\theta \in \Theta$. In this case we call $\theta$ the parameter of interest for $g$. When $\mathcal{X} = \mathbb{R}^k$,
corresponding to the absolutely continuous case, the mapping $x \mapsto g(x; \theta)$ is,
for all $\theta \in \Theta$, a probability density function evaluated at the point $x \in \mathbb{R}^k$.
When $\mathcal{X} = \mathbb{Z}^k$, corresponding to the discrete case, $g(x; \theta)$ is the probability
mass associated with $x \in \mathbb{Z}^k$ and $g(\cdot; \theta)$ therefore maps $\mathbb{Z}^k$ into $[0, 1]$. This
unified terminology will allow us to treat absolutely continuous and discrete
distributions in one common framework. For the sake of simplicity, we rule out
mixed distributions.

Example 3.1. $\theta$-parametric densities are ubiquitous in probability and
statistics. Taking $g : \mathbb{Z} \times \mathbb{R}_0^+ \to [0, 1] : (x, \lambda) \mapsto e^{-\lambda} \lambda^x / x!\ \mathbb{I}_{\mathbb{N}}(x)$, where $\mathbb{I}_A(\cdot)$ stands
for the indicator function of some set $A \in \mathcal{B}_{\mathcal{X}}$, we see the density of a
Poisson $\mathcal{P}(\lambda)$ distribution as a $\lambda$-parametric density. Taking
$g : \mathbb{R} \times (\mathbb{R} \times \mathbb{R}_0^+) \to \mathbb{R}^+ : (x, (\mu, \sigma)') \mapsto (2\pi\sigma^2)^{-1/2} e^{-(x - \mu)^2 / (2\sigma^2)}$, we see the density of a Gaussian
$\mathcal{N}(\mu, \sigma)$ distribution as a $(\mu, \sigma)$-parametric density. If, in the Gaussian case, the
scale is known (and set to $\sigma_0$), one is then only interested in the location
parameter $\mu$. Taking $g : \mathbb{R} \times \mathbb{R} \to \mathbb{R}^+ : (x, \mu) \mapsto g(x; \mu) = g(x; (\mu, \sigma_0)')$, we see the
density of a Gaussian $\mathcal{N}(\mu, \sigma_0)$ distribution as a $\mu$-parametric density. Likewise,
one can see the density of a uniform $U[a, b]$ distribution as an $(a, b)$-parametric
density, an $a$-parametric density or a $b$-parametric density. In general, there are
infinitely many ways to write the density of any given probability distribution
as a $\theta$-parametric density for any given $\theta$. See for instance, on this issue, the
discussion on the so-called natural parameters of the exponential family in [24].

Fix a couple $(\mathcal{X}, \Theta)$ as above, endowed with their respective $\sigma$-algebras and
measures. Throughout this paper, the densities we shall work with all belong
to the class $\mathcal{G} := \mathcal{G}(\mathcal{X}, \Theta)$ of $\theta$-parametric densities for which the mapping
$\theta \mapsto g(\cdot; \theta)$ is differentiable in the sense of distributions. Such distributions may
have a bounded support possibly depending on the parameter $\theta$; we will denote
this support by $S_\theta := S_\theta(g)$, be it dependent on $\theta$ or not.

With this in hand, we are ready to define the two fundamental concepts of 
this paper. These (a class of functions and an operator) mirror notions already 
present in the literature on Stein characterizations. 

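Definition 3.1. Let $\theta_0$ be an interior point of $\Theta$ and let $g \in \mathcal{G}$. We define the
class $\mathcal{F}(g; \theta_0)$ as the collection of test functions $f : \mathcal{X} \times \Theta \to \mathbb{R}$ such that the
following three conditions are satisfied in some neighborhood $\Theta_0 \subset \Theta$ of $\theta_0$.

Condition (i): there exists $c_f \in \mathbb{R}$ such that $\int_{\mathcal{X}} f(x; \theta) g(x; \theta)\,dm_{\mathcal{X}}(x) = c_f$ for
all $\theta \in \Theta_0$.

Condition (ii): the mapping $\theta \mapsto f(\cdot; \theta) g(\cdot; \theta)$ is differentiable in the sense of
distributions over $\Theta_0$.

Condition (iii): there exist $p$ $m_{\mathcal{X}}$-integrable functions $h_i : \mathcal{X} \to \mathbb{R}^+$, $i = 1, \ldots, p$,
such that $|\partial_{\theta_i} (f(x; \theta) g(x; \theta))| \le h_i(x)$ over $\mathcal{X}$ for all $i = 1, \ldots, p$ and for all
$\theta \in \Theta_0$.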



The three conditions in Definition 3.1 are to be compared to the three con- 
ditions discussed at the end of Section 2.3. 

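Definition 3.2. Let $\theta_0$ be an interior point of $\Theta$. Also let $g$ and $\mathcal{F}(g; \theta_0)$ be as
above. We define the Stein operator $\mathcal{T}_{\theta_0} := \mathcal{T}_{\theta_0}(\cdot, g) : \mathcal{F}(g; \theta_0) \to \mathcal{X}^*$ as

$$\mathcal{T}_{\theta_0}(f, g)(x) = \frac{\nabla_\theta \big( f(x; \theta)\, g(x; \theta) \big)\big|_{\theta = \theta_0}}{g(x; \theta_0)}. \tag{3.1}$$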



The operator defined by (3.1), inspired by the rhs of (2.8), requires some
comments. If the support of $g(\cdot; \theta)$ is $\mathcal{X}$ itself, then the operator is obviously
well-defined everywhere. If, on the contrary, the density $g(\cdot; \theta)$ has support $S_\theta \subset \mathcal{X}$,
then there is an ambiguity which we need to avoid. To this end we adopt the
convention that, whenever an expression involves the division by an indicator
function $\mathbb{I}_A$ for some $A \in \mathcal{B}_{\mathcal{X}}$, we are, in fact, multiplying the expression by the
said indicator function. With this convention, writing out the operator in full
(whenever the gradient $\nabla_\theta (f(x; \theta))|_{\theta = \theta_0}$ is well-defined on $\mathcal{X}$) reads

$$\mathcal{T}_{\theta_0}(f, g)(x) = \left( \nabla_\theta f(x; \theta)\big|_{\theta = \theta_0} + f(x; \theta_0)\, \frac{\nabla_\theta g(x; \theta)\big|_{\theta = \theta_0}}{g(x; \theta_0)} \right) \mathbb{I}_{S_{\theta_0}}(x).$$

Our convention not only guarantees that the Stein operator is well-defined but
also that, for any test function $f$, the support of $\mathcal{T}_{\theta_0}(f, g)(x)$ is included in
$S_{\theta_0}$. This convention was implicit throughout the discussion in the heuristic
section. As already mentioned there, the usage of derivatives in the sense of
distributions of $g$ with respect to $\theta$ implies also taking derivatives of indicator
functions whenever $S_\theta$ depends on $\theta$.
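
Example 3.2. (i) Let $\mathcal{X} = \mathbb{R}$, $\Theta = \mathbb{R}$ and $g(x; \mu) = (2\pi)^{-1/2} e^{-(x - \mu)^2 / 2}$, the
density of a univariate normal $\mathcal{N}(\mu, 1)$ distribution. Clearly, $g$ belongs to $\mathcal{G}$ for
all $\mu \in \mathbb{R}$ and its support $S_\mu = \mathbb{R}$ is independent of $\mu$. Fix $\mu_0 = 0$ and consider
functions of the form $f : \mathbb{R} \times \mathbb{R} \to \mathbb{R} : (x, \mu) \mapsto f(x; \mu) := f_0(x - \mu)$, where
$f_0 : \mathbb{R} \to \mathbb{R}$ is chosen such that $f \in \mathcal{F}(g; 0)$. Restricting the operator $\mathcal{T}_0$ to the
collection of $f$'s of this form, it becomes

$$\mathcal{T}_0(f, g)(x) = -f_0'(x) + x f_0(x).$$

(ii) Let $\mathcal{X} = \mathbb{Z}$, $\Theta = \mathbb{R}_0^+$ and $g(x; \lambda) = e^{-\lambda} \lambda^x / x!\ \mathbb{I}_{\mathbb{N}}(x)$, the density of a Poisson
$\mathcal{P}(\lambda)$ distribution. Clearly, $g$ belongs to $\mathcal{G}$ for all $\lambda \in \mathbb{R}_0^+$ and its support $S_\lambda = \mathbb{N}$
is independent of $\lambda$. Fix $\lambda = \lambda_0$ and consider functions of the form
$f : \mathbb{Z} \times \mathbb{R}_0^+ \to \mathbb{R} : (x, \lambda) \mapsto f(x; \lambda) := e^{\lambda} \left[ \lambda f_0(x + 1)/(x + 1) - f_0(x) \right]$, where $f_0 : \mathbb{Z} \to \mathbb{R}$ is
chosen such that $f \in \mathcal{F}(g; \lambda_0)$. Restricting the operator $\mathcal{T}_{\lambda_0}$ to the collection of
$f$'s of this form, it becomes

$$\mathcal{T}_{\lambda_0}(f, g)(x) = e^{\lambda_0} \left( f_0(x + 1) - \frac{x}{\lambda_0} f_0(x) \right) \mathbb{I}_{\mathbb{N}}(x).$$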

Among densities $g \in \mathcal{G}$, those which satisfy the following (local) regularity
assumption at a given interior point $\theta_0 \in \Theta$ will play a particular role.

Assumption A: there exists a rectangular bounded neighborhood $\Theta_0 \subset \Theta$ of $\theta_0$
and an $m_{\mathcal{X}}$-integrable function $h : \mathcal{X} \to \mathbb{R}^+$ such that $g(x; \theta) \le h(x)$ over $\mathcal{X}$ for
all $\theta \in \Theta_0$.

This assumption is weak, and is satisfied for example as soon as the target 
density is bounded over its support. It does, nevertheless, exclude some well- 
known distributions such as, e.g., the arcsine distribution. 

3.2. Main result 

With these notations, we are ready to state and prove our general characteriza- 
tion theorem. 

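Theorem 3.1. Let $g \in \mathcal{G}$, let $Z_\theta$ be distributed according to $g(\cdot; \theta)$, and let $X$
be a random vector taking values on $\mathcal{X}$. Fix an interior point $\theta_0 \in \Theta$. Then the
following two assertions hold.

(1) If $X \stackrel{\mathcal{L}}{=} Z_{\theta_0}$, then $\mathrm{E}[\mathcal{T}_{\theta_0}(f, g)(X)] = 0$ for all $f \in \mathcal{F}(g; \theta_0)$.

(2) If $g$ also satisfies Assumption A at $\theta_0$ and if $\mathrm{E}[\mathcal{T}_{\theta_0}(f, g)(X)] = 0$ for all
$f \in \mathcal{F}(g; \theta_0)$, then

$$X \mid X \in S_{\theta_0} \stackrel{\mathcal{L}}{=} Z_{\theta_0}. \tag{3.2}$$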

The first statement in Theorem 3.1 is standard; it implies that in order to
obtain a Stein operator for a given $\theta$-parametric density $g$ at a point $\theta_0$, it suffices
to find a collection of functions $f$ such that the conditions in Definition 3.1
hold and then apply the operator given in Definition 3.2. As we will show in
the next section, this allows for recovering many well-known Stein operators,
and for constructing many more. The second statement is also quite standard
whenever $S_{\theta_0} = \mathcal{X}$. If $S_{\theta_0} \subsetneq \mathcal{X}$, then things are slightly more tricky. Indeed, in
this case, equation (3.2) does not imply that the law of $X$ is necessarily that of
$Z_{\theta_0}$, but rather that if the distribution of $X$ has support $S_{\theta_0}$ and if $X$ satisfies
$\mathrm{E}[\mathcal{T}_{\theta_0}(f, g)(X)] = 0$ (on $S_{\theta_0}$ by definition of $\mathcal{T}_{\theta_0}(f, g)$) for all $f \in \mathcal{F}(g; \theta_0)$, then
$X$ is distributed according to $g(\cdot; \theta_0)$. This is in accordance with all other results
of this form.

Proof. (1) Since Condition (iii) allows for differentiating w.r.t. $\theta$ under the
integral in Condition (i) and since differentiating w.r.t. $\theta$ is allowed thanks to
Condition (ii), the claim follows immediately.

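(2) First suppose that $p = 1$, and fix $\Theta_0 \subset \Theta$, a bounded (rectangular)
neighborhood of $\theta_0$ on which $g$ satisfies Assumption A at $\theta_0$. Define, for $A \in \mathcal{B}_{\mathcal{X}}$,
the mapping

$$f_A : \mathcal{X} \times \Theta_0 \to \mathbb{R} : (x, \theta) \mapsto \frac{1}{g(x; \theta)} \int_{\theta_0}^{\theta} l_A(x; u, \theta)\, g(x; u)\,dm_\Theta(u) \tag{3.3}$$

with

$$l_A(x; u, \theta) := \big( \mathbb{I}_A(x) - \mathrm{P}(Z_u \in A \mid Z_u \in S_\theta) \big)\, \mathbb{I}_{S_\theta}(x),$$

where

$$\mathrm{P}(Z_u \in B) = \int_{\mathcal{X}} \mathbb{I}_B(x)\, g(x; u)\,dm_{\mathcal{X}}(x)$$

for $B \in \mathcal{B}_{\mathcal{X}}$. Note that, for the event $[Z_u \in S_\theta]$ to have a non-zero probability, it
is crucial to work in a neighborhood $\Theta_0$ rather than in $\Theta$; clearly, this event is
always true when $S_\theta$ does not depend on $\theta$. To see that $f_A$ belongs to $\mathcal{F}(g; \theta_0)$,
first note that

$$\begin{aligned}
\int_{\mathcal{X}} f_A(x; \theta)\, g(x; \theta)\,dm_{\mathcal{X}}(x) &= \int_{\mathcal{X}} \int_{\theta_0}^{\theta} l_A(x; u, \theta)\, g(x; u)\,dm_\Theta(u)\,dm_{\mathcal{X}}(x) \\
&= \int_{\theta_0}^{\theta} \int_{\mathcal{X}} l_A(x; u, \theta)\, g(x; u)\,dm_{\mathcal{X}}(x)\,dm_\Theta(u),
\end{aligned}$$

where the last equality follows from Fubini's theorem, which can be applied for
all $\theta \in \Theta_0$, since in this case there exists a constant $M$ such that

$$\int_{\theta_0}^{\theta} \int_{\mathcal{X}} |l_A(x; u, \theta)|\, g(x; u)\,dm_{\mathcal{X}}(x)\,dm_\Theta(u) \le 2|\theta - \theta_0| \le M$$

for all $\theta \in \Theta_0$. We also have, by definition of $l_A$,

$$\begin{aligned}
\int_{\mathcal{X}} l_A(x; u, \theta)\, g(x; u)\,dm_{\mathcal{X}}(x) &= \mathrm{P}(Z_u \in A \cap S_\theta) - \mathrm{P}(Z_u \in A \mid Z_u \in S_\theta)\, \mathrm{P}(Z_u \in S_\theta) \\
&= 0.
\end{aligned}$$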
Hence $f_A$ satisfies Condition (i). Condition (ii) is easily checked. Regarding
Condition (iii), one sees that

$$\partial_t \big( f_A(x; t)\, g(x; t) \big)\big|_{t = \theta} = l_A(x; \theta, \theta)\, g(x; \theta) + H(x; \theta), \tag{3.4}$$

with $H(x; \theta)$ a function whose complete expression is provided in the Appendix.
As shown there, it is easy to bound $H(x; \theta)$ uniformly in $\theta$ over $\Theta_0$ by an
$m_{\mathcal{X}}$-integrable function. Moreover, Assumption A guarantees that the same holds
for $l_A(x; \theta, \theta)\, g(x; \theta)$. Hence $f_A$ satisfies Condition (iii). Wrapping up, we have
thus proved that $f_A \in \mathcal{F}(g; \theta_0)$. The conclusion follows, since $H(x; \theta_0) = 0$ for
all $x \in \mathcal{X}$ (see the Appendix) and since, by hypothesis,

$$\mathrm{E}[\mathcal{T}_{\theta_0}(f_A, g)(X)] = \mathrm{E}\big[ \mathbb{I}_{A \cap S_{\theta_0}}(X) - \mathrm{P}(Z_{\theta_0} \in A)\, \mathbb{I}_{S_{\theta_0}}(X) \big] = 0.$$

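Next suppose that $p > 1$. Let $\theta_0 := (\theta_0^1, \ldots, \theta_0^p)$ and fix $\Theta_0 := \Theta_0^1 \times \ldots \times \Theta_0^p$,
a bounded (rectangular) neighborhood of $\theta_0$ on which $g$ satisfies Assumption A
at $\theta_0$. Define, for all $j = 1, \ldots, p$ and for all $A \in \mathcal{B}_{\mathcal{X}}$, the mappings

$$\tilde{\theta}^j : \Theta_0^j \to \Theta_0 : u \mapsto (\theta_0^1, \ldots, \theta_0^{j-1}, u, \theta^{j+1}, \ldots, \theta^p)$$

and

$$f_A^j : \mathcal{X} \times \Theta_0 \to \mathbb{R} : (x, \theta) \mapsto \frac{1}{g(x; \theta)} \int_{\theta_0^j}^{\theta^j} l_A^j(x; u, \theta)\, g(x; \tilde{\theta}^j(u))\,dm_{\Theta^j}(u),$$

with

$$l_A^j(x; u, \theta) := \Big( \mathbb{I}_A(x) - \mathrm{P}\big( Z_u^j \in A \mid Z_u^j \in S_{\tilde{\theta}^j(\theta^j)} \big) \Big)\, \mathbb{I}_{S_{\tilde{\theta}^j(\theta^j)}}(x),$$

where

$$\mathrm{P}(Z_u^j \in B) := \int_{\mathcal{X}} \mathbb{I}_B(x)\, g(x; \tilde{\theta}^j(u))\,dm_{\mathcal{X}}(x)$$

for $B \in \mathcal{B}_{\mathcal{X}}$. The $p$-variate equivalent of the function $f_A$ in (3.3) is given by
$f_A^{(p)}(x; \theta) := \sum_{j=1}^{p} f_A^j(x; \theta)$. Along the same lines as for the special case $p = 1$,
Conditions (i)-(iii) are now easily seen to be satisfied by $f_A^{(p)}$ (we draw the
reader's attention to the fact that the rectangular nature of the neighborhood $\Theta_0$
is important in order to ensure Condition (iii)). The result readily follows. □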

Remark 3.1. Nowhere in the proof did we need to specify whether the random
vector $X$ is univariate (for $k = 1$) or multivariate (for $k > 1$).

Remark 3.2. When $p > 1$, the (vectorial) operator $\mathcal{T}_{\theta_0}(f, g)$ contains, in a
sense, $p$ different characterizations of the $\theta = (\theta^1, \ldots, \theta^p)$-parametric density $g$
at $\theta_0$. The requirements (in this formulation of the result) on the test functions
$f$ are, perhaps, unnecessarily stringent. Indeed, setting $\theta^{(q)} := (\theta^{i_1}, \ldots, \theta^{i_q})$ for
$1 \le i_1 < \ldots < i_q \le p$, we can obviously consider $g$ as a $\theta^{(q)}$-parametric
density. The corresponding $q$-dimensional sub-vector of $\mathcal{T}_{\theta_0}(f, g)$ also gives rise to
a (vectorial) Stein operator for which the conclusions of Theorem 3.1 also hold
at $\theta_0$, this time with a possibly larger class of test functions $f$ (thanks to the
weakening of the requirements imposed by Condition (iii)). In particular, taking
$q = 1$, we obtain $p$ distinct one-dimensional characterizations of $g$ at $\theta_0$. This
might be very helpful in approximation theorems concerning $g$.

Remark 3.3. Note that both implications in Theorem 3.1 are obtained at fixed
$\theta_0 \in \Theta$. We draw the reader's attention to the fact that all our calculations
and manipulations, as well as all the conditions on the functions at play, are
consequently local around $\theta_0$.

Remark 3.4. All the definitions and arguments above can be extended to en- 
compass distributions with a discrete parameter space (such as, e.g., the dis- 
crete uniform). For this it suffices, in a sense, to replace the derivatives and 
integrals by forward (or backward) differences and summations, respectively. Al- 
though it is easy to obtain Stein operators by this means, determining the exact 
conditions under which the theorem holds nevertheless requires some care, since 
in this case there arise problems which originate in the interplay between the 
support of the target density and the parameter of interest. Because of these 
(structural) intricacies, working out explicit conditions on the target density in 
this framework appears to be a rather sterile exercise, which is perhaps better 
suited to ad hoc, case-by-case arguments. This issue will not be addressed
further in the present paper.

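The first statement of Theorem 3.1 can be seen as a user-friendly Stein operator-producing mechanism, since any subclass $\tilde{\mathcal{F}}(g;\theta_0) \subset \mathcal{F}(g;\theta_0)$ yields a left-right implication, i.e. an implication of the form

$$ X \sim g(\cdot;\theta_0) \Longrightarrow E[\mathcal{T}_{\theta_0}(f, g)(X)] = 0 \text{ for all } f \in \tilde{\mathcal{F}}(g;\theta_0). $$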

This raises some important questions. Indeed, consider for instance the two op- 
erators provided in Example 3.2. As it turns out, both these operators have 
proven to be extremely useful in applications and their properties are funda- 
mental in the history of the Stein method. However, as already noted by a 
number of authors before us, they are by no means the only such operators for 
the Gaussian or the Poisson distribution; in our framework they are just two 
particular instances of equation (3.1) restricted to certain very specific forms 
of test functions. A natural question is therefore that of whether there exist 
other subclasses of test functions for which the corresponding operators would 
also be useful in applications. It is possible that this question does not allow 
for a fully satisfactory answer. More precisely it is possible that, for any given 
problem, there is no a priori reason why a given operator would yield better 
rates of convergence than any other, and perhaps in each problem a careful 
combination of different characterizations (à la Chatterjee, Fulman and Röllin
[6]) would be fruitful and would allow for obtaining better results than those 
obtained by focusing on a single characterization alone. 

In any case it seems intuitively clear that, in order for a subclass and the corresponding operator to be of practical use, they need to characterize the law under consideration, that is, we should have the relationship

$$ X \sim g(\cdot;\theta_0) \iff E[\mathcal{T}_{\theta_0}(f, g)(X)] = 0 \text{ for all } f \in \tilde{\mathcal{F}}(g;\theta_0), $$

where the right-left implication is to be understood in the sense of (3.2) in case $S_{\theta_0}$ is a strict subset of $\mathcal{X}$. Constructing such subclasses, which we call $\theta$-characterizing for $g$ at $\theta_0$, is relatively easy. Indeed it suffices to adjoin the function $f_A$ defined in (3.3) to any collection (even empty) of test functions which satisfy the three conditions in Definition 3.1. Such an approach is, however, of limited interest and, moreover, does not allow for clearly identifying the form of the corresponding operators. We therefore suggest a more constructive approach, which we describe in detail in the next section.

4. Characterizing probability distributions 

In this section we provide a general "recipe" which allows for constructing $\theta$-characterizing subclasses with well-identified operators. We apply our method to build general characterizations for location families, scale families and discrete distributions. Many well-known Stein characterizations fall under the umbrella of these results. We also show how our method can be applied to obtain more unusual characterizations.



4.1. Characterizations under an exchangeability condition

For the sake of simplicity, we let $k = p = 1$. Fix $\theta_0 \in \Theta$ and choose $g \in \mathcal{G}$ which satisfies Assumption A at $\theta_0$. In order to construct a $\theta$-characterizing subclass $\tilde{\mathcal{F}}(g;\theta_0) \subset \mathcal{F}(g;\theta_0)$, we suggest the following method.

Step 1: Consider Condition (i) in Definition 3.1, which requires that we have

$$ \int_{\mathcal{X}} f(x;\theta)\, g(x;\theta)\, dm_{\mathcal{X}}(x) = c_f $$

for $c_f \in \mathbb{R}$. In many cases, the interaction between the variable $x$ and the parameter $\theta$ within the density $g$ allows one to determine a favored family of test functions $\tilde{f}(x;\theta)$ which satisfy this condition. Moreover, these functions are usually expressible as $\tilde{f}(x;\theta) = \tilde{f}(f_0;\theta)(x)$, with $f_0 \in \mathcal{X}^\star$ and $\tilde{f} : \mathcal{X}^\star \times \Theta \to (\mathcal{X} \times \Theta)^\star$.

Step 2: For $\tilde{f}$ and $f_0$ as given in Step 1, define the exchanging operator $T : \mathcal{X}^\star \times \Theta \to (\mathcal{X} \times \Theta)^\star$ as a transformation which satisfies the exchangeability condition

$$ d_\theta\big(\tilde{f}(f_0;\theta)(x)\, g(x;\theta)\big)\big|_{\theta=\theta_0} = d_y\big(T(f_0;\theta_0)(y)\, g(y;\theta_0)\big)\big|_{y=x} \tag{4.1} $$

over $\mathcal{X}$, where $d_y$ either means the derivative in the sense of distributions or the discrete (forward or backward) difference, and we hereby implicitly require that $T$ is such that the derivative on the rhs of (4.1) is well-defined over $\mathcal{X}$.

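Step 3: Define the class $\mathcal{F}_0 := \mathcal{F}_0(g;\theta_0)$ as the collection of all functions $f_0 \in \mathcal{X}^\star$ such that $\tilde{f}(f_0;\theta) \in \mathcal{F}(g;\theta_0)$. Note that we therefore have the (new) left-right implication

$$ X \sim g(\cdot;\theta_0) \Longrightarrow E\left[\frac{d_y\big(T(f_0;\theta_0)(y)\, g(y;\theta_0)\big)\big|_{y=X}}{g(X;\theta_0)}\right] = 0 \text{ for all } f_0 \in \mathcal{F}_0. $$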



Step 4: Solve the Stein equation

$$ d_y\big(T(f_0^A;\theta_0)(y)\, g(y;\theta_0)\big)\big|_{y=x} = l_A(x;\theta_0,\theta_0)\, g(x;\theta_0) \tag{4.2} $$

where $l_A(x;\theta_0,\theta_0)$ is as in the proof of Theorem 3.1. If $T(\cdot;\theta_0)$ is invertible, it then suffices to check whether the corresponding $f_0^A$ belongs to $\mathcal{F}_0$ in order to obtain the characterization

$$ X \sim g(\cdot;\theta_0) \iff E\left[\frac{d_y\big(T(f_0;\theta_0)(y)\, g(y;\theta_0)\big)\big|_{y=X}}{g(X;\theta_0)}\right] = 0 \text{ for all } f_0 \in \mathcal{F}_0, $$

where the right-left implication is to be understood, as before, in the sense of (3.2) in case the support $S_{\theta_0}$ of $g(\cdot;\theta_0)$ is a strict subset of $\mathcal{X}$.

The resulting $\theta$-characterizing subclass $\tilde{\mathcal{F}}(g;\theta_0)$ is none other than the collection $\{\tilde{f}(f_0;\theta) \mid f_0 \in \mathcal{F}_0\} \cup \{f_A\}$; this collection not only has the desired properties, but also comes with a well-identified Stein operator. In the sequel it will be more convenient to state our results in terms of $\mathcal{F}_0$ rather than in terms of $\tilde{\mathcal{F}}(g;\theta_0)$. This is in accordance with all other results of this form.

There are a number of ways in which one can extend the method presented above to the cases $k > 1$ and $p > 1$. Also, for given $\theta_0$ and $\theta$-parametric density $g$, the choice of class $\mathcal{F}_0$ and exchanging operator $T$ is not unique. Moreover, determining straightforward minimal conditions on the $f_0$ for the characterization to hold seems to be impossible without making further regularity assumptions on the target density $g$. These considerations entail that it is perhaps more fruitful to tackle different $\theta$-parametric densities with ad hoc arguments. There are nevertheless important instances in which one can obtain general results with relative ease. To this end consider the following assumption on univariate $\theta$-parametric densities.

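Assumption B: there exists $x_0 \in \mathcal{X}$ such that

$$ \left| \int_{\mathcal{X}} \left( \int_{x_0}^{x} l_A(y;\theta_0,\theta_0)\, g(y;\theta_0)\, dm_{\mathcal{X}}(y) \right) I_{S_{\theta_0}}(x)\, dm_{\mathcal{X}}(x) \right| < \infty $$

for all $A \in \mathcal{B}_{\mathcal{X}}$, where $l_A(y;\theta_0,\theta_0)$ is defined as in the proof of Theorem 3.1.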



imsart-generic ver. 2009/08/13 file: Ley_Swan_2011a.tex date: January 25, 2013 



C. Ley and Y. Swan. /Stein characterizations 






This is a condition on the tails of the density $g(\cdot;\theta_0)$ which is, for instance, satisfied by the Gaussian and the exponential distributions (the latter is evident; for the former, see [10], page 4). As we will see, Assumption B is useful for determining general characterization results in location and scale models.



4.2. Location-based characterizations

In this subsection we apply the method described in Section 4.1 to study laws 
whose parameter of interest is a location parameter. 

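Corollary 4.1. Let $k = p = 1$ and $\mathcal{X} = \Theta = \mathbb{R}$, and fix $\mu_0 \in \Theta$. Define $\mathcal{G}_{loc}$ as the collection of densities $g_0 : \mathcal{X} \to \mathbb{R}^+$ with support $S \subset \mathcal{X}$ such that the $\mu$-parametric density $g(x;\mu) = g_0(x - \mu)$ belongs to $\mathcal{G}$ and satisfies Assumptions A and B at $\mu_0$. Let $\Theta_0 \subset \Theta$ be as in Assumption A, and define $\mathcal{F}_0 := \mathcal{F}_0(g_0;\mu_0)$ as the collection of all $f_0 : \mathcal{X} \to \mathbb{R}$ such that

Condition ($\mu$-i): $\left| \int_{\mathcal{X}} f_0(x) g_0(x)\, dm_{\mathcal{X}}(x) \right| < \infty$,

Condition ($\mu$-ii): the mapping $x \mapsto f_0(x) g_0(x)$ is differentiable in the sense of distributions over $\mathcal{X}$,

Condition ($\mu$-iii): there exists a $m_{\mathcal{X}}$-integrable function $h : \mathcal{X} \to \mathbb{R}^+$ such that $\left| d_y\big(f_0(y - \mu) g_0(y - \mu)\big)\big|_{y=x} \right| \le h(x)$ over $\mathcal{X}$ for all $\mu \in \Theta_0$.

Then $\mathcal{F}_0$ is $\mu$-characterizing for $g_0$ at $\mu_0$, with $\mu$-characterizing operator

$$ \mathcal{T}_{\mu_0}(f_0, g_0) : \mathcal{X} \to \mathbb{R} : x \mapsto \frac{d_y\big(f_0(y - \mu_0)\, g_0(y - \mu_0)\big)\big|_{y=x}}{g_0(x - \mu_0)}. \tag{4.3} $$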

A proof, which is a direct application of the method described in Section 4.1, 
is provided in the Appendix. The operator in (4.3) - as well as the conditions 
on the densities and the conditions on the test functions fo - differ slightly from 
those already available in the literature; this matter has already been discussed 
in Section 2. 

Corollary 4.1 contains a number of well-known univariate characterizations covered in the literature. For instance, taking $g(\cdot;\mu)$ to be the density of a $\mathcal{N}(\mu, 1)$ (which satisfies Assumptions A and B at $\mu_0 = 0$) we can use the operator provided in Example 3.2; Corollary 4.1 then leads to the famous Stein characterization of the standard normal distribution. Likewise, introducing an artificial location parameter $\mu$ within the exponential density with scale parameter 1 (which, again, satisfies Assumptions A and B at $\mu_0 = 0$) leads to the characterization of the exponential distribution given in Example 2.2. More generally, when $g$ belongs to the (continuous) exponential family (see [19]), one easily sees how the same manipulations allow one to retrieve the known characterizations (see also [20] or [24]). We refer to [35], [12] and [32] for more location-based characterizations.

Next consider the semi-circular law whose density is given by

$$ g_0(x - \mu) = \frac{2}{\pi\sigma^2} \sqrt{\sigma^2 - (x - \mu)^2}\; I_{[-\sigma,\sigma]}(x - \mu), \tag{4.4} $$

with $\mu \in \mathbb{R}$ being a location and $\sigma \in \mathbb{R}_0^+$ a known scale parameter. In the special case $\mu = 0$ and $\sigma = 2$, Götze and Tikhomirov [17] prove that a random variable $X$ is distributed according to (4.4) if and only if

$$ E[(4 - X^2) f'(X) - 3X f(X)] = 0 \tag{4.5} $$

for all test functions $f$ in a certain class of functions. We claim that (4.5) falls within the category of location-based characterizations. To see this it suffices to note that, although we are in a location model with target density satisfying Assumptions A and B at all points $\mu_0 \in \mathbb{R}$, the derivative $g_0'(x - \mu)$ is not bounded at the edges of the support. Conditions ($\mu$-ii) and ($\mu$-iii) therefore entail some stringent requirements on the admissible class of test functions. In order to be able to read these requirements more easily, one way to proceed is to consider only $f_0$'s of the form $f_0(x) = f_1(x)(\sigma^2 - x^2)^r$, with $r > 1/2$. Writing out the location-based characterization in terms of the functions $f_1$ instead of $f_0$ yields, for $r = 1$, the expression in (4.5); sufficient conditions on $f_1$ for $f_0$ to belong to $\mathcal{F}_0$ are easy to provide (see [17] in the case $r = 1$ and $\sigma = 2$).
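
As a quick check of this claim, the computation can be written out for $\mu_0 = 0$, $\sigma = 2$ and $r = 1$. The sketch below assumes that the operator (4.3) is read as $(f_0 g_0)'/g_0$ and works in the interior of the support only; boundary contributions are not tracked.

$$ f_0(x)\, g_0(x) = \frac{1}{2\pi}\, f_1(x)\, (4 - x^2)^{3/2}, \qquad |x| < 2, $$

$$ \frac{(f_0 g_0)'(x)}{g_0(x)} = \frac{f_1'(x)(4 - x^2)^{3/2} - 3x f_1(x)(4 - x^2)^{1/2}}{(4 - x^2)^{1/2}} = (4 - x^2) f_1'(x) - 3x f_1(x). $$

Taking expectations under the semi-circular law with $\sigma = 2$ then gives the expression in (4.5) with $f = f_1$.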

Note that, when the target density belongs to Pearson's family of distribu- 
tions, there exists a general result due to [32] for obtaining Stein characteri- 
zations which encompasses many of the characterizations obtainable through 
Stein's density approach. We wish to stress the fact that all these results can be 
recovered through our Corollary 4.1. 

And now a multivariate example. Consider a random $k$-vector $Z_{\mu_0}$ with $\mu_0 \in \mathbb{R}^k$ and density of the form $g(x;\mu) := g_0(x - \mu) = g_0(x_1 - \mu^1, x_2 - \mu^2, \ldots, x_k - \mu^k)$. Suppose, for the sake of simplicity, that the support of $g_0(x - \mu)$ does not depend on $\mu$ (i.e. $S = \mathcal{X} = \mathbb{R}^k$). One way to characterize such distributions at $\mu_0$ is to define, for fixed $x_2, \ldots, x_k$, the univariate $\mu^1$-parametric density $g_1(x_1;\mu^1) = g_0(x_1 - \mu^1, x_2 - \mu_0^2, \ldots, x_k - \mu_0^k)$. Requiring that $g_1 \in \mathcal{G}$ and satisfies Assumptions A and B at $\mu_0^1$, we easily determine a class of functions $\mathcal{F}_0$ as in Corollary 4.1 to obtain

$$ X \overset{d}{=} Z_{\mu_0} \iff E\left[\frac{\partial_y\big(f_0(y - \mu_0, X)\, g_0(y - \mu_0, X)\big)\big|_{y=X_1}}{g_0(X - \mu_0)}\right] = 0 \text{ for all } f_0 \in \mathcal{F}_0, \tag{4.6} $$

where we use the abuse of notations $(y - \mu_0, X) = (y - \mu_0^1, X_2 - \mu_0^2, \ldots, X_k - \mu_0^k)$ and $X - \mu_0 = (X_1 - \mu_0^1, X_2 - \mu_0^2, \ldots, X_k - \mu_0^k)$. The choice of $\mu^1$ as parameter of interest was of course for convenience only, and similar relationships hold for derivatives with respect to $x_2, \ldots, x_k$ as well. Moreover, when $Z_{\mu_0}$ has support $\mathcal{X}$ and independent marginals, one easily sees how to aggregate these different results and write out a class of functions $\mathcal{F}_0^{(k)}$ as in Corollary 4.1 to get

$$ X \sim g(\cdot;\mu_0) \iff E\left[\frac{\nabla_y\big(f_0(y - \mu_0)\, g_0(y - \mu_0)\big)\big|_{y=X}}{g_0(X - \mu_0)}\right] = 0 \text{ for all } f_0 \in \mathcal{F}_0^{(k)}. \tag{4.7} $$
We conclude this section by showing how (4.6) and (4.7) read in the Gaussian case. Here, setting $\mu_0 = 0 \in \mathbb{R}^k$ and plugging the multivariate Gaussian density $g(x;\mu,\Sigma)$ with $\Sigma$ a known symmetric positive definite $k \times k$ matrix into (4.6) we get, for $j = 1, \ldots, k$,

$$ X \sim \mathcal{N}_k(0,\Sigma) \iff E\left[\partial_{y_j}\big(f_0(y_j, X)\big)\big|_{y_j=X_j} - \sigma_j f_0(X)\right] = 0 \text{ for all } f_0 \in \mathcal{F}_0 \tag{4.8} $$

where we use the notations $(y_j, X) = (X_1, \ldots, X_{j-1}, y_j, X_{j+1}, \ldots, X_k)$ and $\sigma_j := (\Sigma^{-1} X)_j = \sum_{i=1}^k (\Sigma^{-1})_{ji} X_i$. Moreover, when $\Sigma$ is the identity matrix $I_k$ we can use (4.7) to obtain

$$ X \sim \mathcal{N}_k(0, I_k) \iff E\left[\nabla_y\big(f_0(y)\big)\big|_{y=X} - X f_0(X)\right] = 0 \text{ for all } f_0 \in \mathcal{F}_0^{(k)}. \tag{4.9} $$

These characterizations of the multivariate Gaussian are, to the best of our knowledge, new. They are to be compared with existing results given, e.g., in [7] and [30].

4.3. Scale-based characterizations

In this subsection we apply the method described in Section 4.1 to study laws 
whose parameter of interest is a scale parameter. 

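Corollary 4.2. Let $k = p = 1$, $\mathcal{X} = \mathbb{R}$ and $\Theta = \mathbb{R}_0^+$, and fix $\sigma_0 \in \Theta$. Define $\mathcal{G}_{sca}$ as the collection of densities $g_0 : \mathcal{X} \to \mathbb{R}^+$ with support $S \subset \mathcal{X}$ such that the $\sigma$-parametric density $g(x;\sigma) = \sigma g_0(\sigma x)$ belongs to $\mathcal{G}$ and satisfies Assumptions A and B at $\sigma_0$. Let $\Theta_0 \subset \Theta$ be as in Assumption A, and define $\mathcal{F}_0 := \mathcal{F}_0(g_0;\sigma_0)$ as the collection of all $f_0 : \mathcal{X} \to \mathbb{R}$ such that

Condition ($\sigma$-i): $\left| \int_{\mathcal{X}} f_0(x) g_0(x)\, dm_{\mathcal{X}}(x) \right| < \infty$,

Condition ($\sigma$-ii): the mapping $x \mapsto x f_0(x) g_0(x)$ is differentiable in the sense of distributions over $\mathcal{X}$,

Condition ($\sigma$-iii): there exists a $m_{\mathcal{X}}$-integrable function $h : \mathcal{X} \to \mathbb{R}^+$ such that $\left| d_y\big(y f_0(\sigma y) g_0(\sigma y)\big)\big|_{y=x} \right| \le h(x)$ over $\mathcal{X}$ for all $\sigma \in \Theta_0$.

Then $\mathcal{F}_0$ is $\sigma$-characterizing for $g_0$ at $\sigma_0$, with $\sigma$-characterizing operator

$$ \mathcal{T}_{\sigma_0}(f_0, g_0) : \mathcal{X} \to \mathbb{R} : x \mapsto \frac{d_y\big(y f_0(\sigma_0 y)\, g_0(\sigma_0 y)\big)\big|_{y=x}}{\sigma_0\, g_0(\sigma_0 x)}. \tag{4.10} $$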

The proof of Corollary 4.2 is similar to that of Corollary 4.1, and hence is 
omitted. 
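
Since the proof is omitted, here is a sketch of the exchangeability computation behind (4.10). It assumes the natural choice of test functions $x \mapsto f_0(\sigma x)$ (our assumption, by analogy with the location case) and writes $u(t) = f_0(t) g_0(t)$:

$$ \partial_\sigma\big(f_0(\sigma x)\, \sigma g_0(\sigma x)\big) = \partial_\sigma\big(\sigma u(\sigma x)\big) = u(\sigma x) + \sigma x\, u'(\sigma x), $$

$$ \partial_y\big(y\, u(\sigma y)\big)\big|_{y=x} = u(\sigma x) + \sigma x\, u'(\sigma x). $$

Both derivatives coincide, which is relation (4.1) with $T(f_0;\sigma)(y)\, g(y;\sigma) = y f_0(\sigma y) g_0(\sigma y)$. Dividing by $g(x;\sigma_0) = \sigma_0 g_0(\sigma_0 x)$ then gives an operator of the form (4.10).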

As in the location case, this result can be extended in a number of ways to the multivariate setting. In the univariate setup, if $g$ is the exponential density with scale parameter $\lambda$ and if $\lambda_0$ is set to 1, we retrieve the characterization (2.5). If $g_0$ is the density of a $\mathcal{N}(0, 1)$ distribution, the above characterization reads

$$ X \sim \mathcal{N}(0, 1) \iff E[X f_0'(X) + (1 - X^2) f_0(X)] = 0 \tag{4.11} $$

for all (differentiable) $f_0 \in \mathcal{F}_0$.
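
The characterization (4.11) lends itself to a quick numerical sanity check. The sketch below is not part of the original argument: it picks the hypothetical test function $f_0 = \cos$, uses a hand-written Simpson rule (the helper names `simpson`, `scale_operator` and `expected_operator` are ours), and compares the expectation under N(0, 1) with the expectation under N(0, 4). For this particular $f_0$ a direct computation with the characteristic function gives $e^{-s^2/2}(1 - s^2)^2$ under N(0, s^2), that is 0 for $s = 1$ and $9e^{-2}$ for $s = 2$.

```python
import math


def simpson(func, a, b, n=4000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def normal_pdf(x, sd):
    """Density of N(0, sd^2)."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def scale_operator(f, df, x):
    """Integrand of (4.11): x f'(x) + (1 - x^2) f(x)."""
    return x * df(x) + (1 - x * x) * f(x)


def expected_operator(f, df, sd):
    """E[scale_operator(X)] for X ~ N(0, sd^2), truncated to [-12 sd, 12 sd]."""
    return simpson(lambda x: scale_operator(f, df, x) * normal_pdf(x, sd),
                   -12 * sd, 12 * sd)


# Hypothetical test function f0 = cos, with derivative -sin (an assumption
# made for illustration only).
f0 = math.cos
df0 = lambda x: -math.sin(x)

under_target = expected_operator(f0, df0, sd=1.0)  # numerically zero
under_other = expected_operator(f0, df0, sd=2.0)   # close to 9 * exp(-2)
print(under_target, under_other)
```

An even test function is used on purpose: for odd $f_0$ the expectation vanishes under every symmetric law, so the check would not discriminate between scales.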

4.4. Discrete characterizations

Our last general result concerns discrete distributions. In this instance there is, 
in general, no unique interpretation of the parameters of interest; it depends 
on the law under investigation. As will be clear from the proof of Corollary 4.3 
below (see the Appendix), our approach in this setting allows us to dispense 
with Assumption B, which was needed in order to ensure Condition (i) in Defi- 
nition 3.1. However we need to strengthen Assumption A as follows. 

Assumption A': for $\psi(x;\theta) := d_u\big(g(x;u)/g(0;u)\big)\big|_{u=\theta}$, there exists a neighborhood $\Theta_0$ of $\theta_0$ and a summable function $h : \mathbb{Z} \to \mathbb{R}^+$ such that



over $\mathcal{X}$ for all $\theta \in \Theta_0$ and for all $A \in \mathcal{B}_{\mathcal{X}}$, where $l_A(j;\theta_0,\theta_0)$ is defined as in the proof of Theorem 3.1 and where $\Delta_x^+$ is the forward difference with respect to $x$.

Assumption A' is sufficient to ensure Condition (iii) in the discrete setting. It is not restrictive and is satisfied by all the (discrete) distributions we have considered. For example, in the Poisson case, the ratio $\psi(x;\theta)/\psi(x;\theta_0)$ is none other than $(\lambda/\lambda_0)^{x-1} I_{\mathbb{N}}(x)$ so that known arguments (see page 65 of [15]) apply.

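Corollary 4.3. Let $k = p = 1$, $\mathcal{X} = \mathbb{Z}$ and $\Theta \subset \mathbb{R}$, and fix $\theta_0 \in \Theta$. Define $\mathcal{G}_{dis}$ as the collection of $\theta$-parametric discrete densities $g(\cdot;\theta) : \mathcal{X} \to [0, 1]$ with support $S \subset \mathcal{X}$, which we take of the form $S = [N] := \{0, \ldots, N\}$ for some $N \in \mathbb{N}_0 \cup \{\infty\}$ not depending on $\theta$, such that $g \in \mathcal{G}$ and satisfies Assumption A' at $\theta_0$. Define $\mathcal{F}_0$ as the collection of all functions $f_0 : \mathcal{X} \to \mathbb{R}$ for which there exists a summable function $h : \mathbb{Z} \to \mathbb{R}^+$ such that $\left| \Delta_x^+\big(f_0(x)\, d_u(g(x;u)/g(0;u))|_{u=\theta}\big) \right| \le h(x)$ over $\mathcal{X}$ for all $\theta \in \Theta_0$, with $\Theta_0$ as in Assumption A'.

Then $\mathcal{F}_0$ is $\theta$-characterizing for $g$ at $\theta_0$, with $\theta$-characterizing operator

$$ \mathcal{T}_{\theta_0}(f_0, g) : \mathcal{X} \to \mathbb{R} : x \mapsto \frac{\Delta_x^+\big(f_0(x)\, d_\theta(g(x;\theta)/g(0;\theta))|_{\theta=\theta_0}\big)}{g(x;\theta_0)}. $$

Corollary 4.3 contains a number of well-known discrete characterizations covered in the literature among which, for instance, those for the Poisson (see the operator in Example 3.2), the geometric Geom($p$), with $p$-characterizing operator
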
The same arguments allow, of course, for dealing with other perhaps more exotic discrete distributions. Consider, for the sake of illustration, the case of the multinomial $M(n, p_1, \ldots, p_k)$, with density

$$ g(x) = \frac{n!}{\prod_{j=0}^k x_j!} \prod_{j=0}^k p_j^{x_j}\; I_A(x) \tag{4.12} $$

where $x_0 = n - \sum_{j=1}^k x_j$, $p_0 = 1 - \sum_{j=1}^k p_j$ and

$$ A = \{(x_1, \ldots, x_k) \in \mathbb{N}^k \mid 0 \le x_1 + \ldots + x_k \le n\}. $$

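In the same spirit as our previous multivariate characterizations, we start by transforming the problem into a univariate one. For this choose $p_1$ to be the parameter of interest, and rewrite (4.12) as

$$ g(x;p_1) = \frac{n!}{x_1!\,(n_1 - x_1)!}\; p_1^{x_1} (\bar{p}_1 - p_1)^{n_1 - x_1} \prod_{j=2}^k \frac{p_j^{x_j}}{x_j!}\; I_A(x), $$

where, letting $\bar{x}_1 = \sum_{j=2}^k x_j$, we denote $n_1 = n - \bar{x}_1$ and $\bar{p}_1 = 1 - \sum_{j=2}^k p_j$. Straightforward computations readily yield the corresponding operator

$$ \mathcal{T}_{p_1}(f_0, g)(x) = \xi(x;p_1) \left( (n_1 - x_1) f_0(x_1 + 1) - \frac{\bar{p}_1 - p_1}{p_1}\, x_1 f_0(x_1) \right) I_A(x), $$

with

$$ \xi(x;p_1) = \frac{\bar{p}_1\, n_1! \prod_{j=2}^k x_j!}{n! \prod_{j=2}^k p_j^{x_j}\, (\bar{p}_1 - p_1)^{n_1 + 2}}. $$

In each of the above cases, determining sufficient conditions on the test functions $f_0$ for the operators to be $\theta$-characterizing is now a simple exercise which is left to the reader.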
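
To make the discrete construction concrete, the following sketch works through the Poisson case numerically. It rests on stated assumptions rather than on the paper's own code: the operator is read as the forward difference of $f_0(x)\psi(x;\lambda_0)$ divided by the probability mass function, with $\psi(x;\lambda) = d_u(u^x/x!)|_{u=\lambda}$, and the test function $f_0(x) = 1/(1+x)$ and all function names are hypothetical choices of ours. The code compares this operator with the closed form $e^{\lambda_0}(f_0(x+1) - (x/\lambda_0) f_0(x))$ and evaluates its expectation under a matching and a non-matching Poisson law.

```python
import math


def poisson_pmf(x, lam):
    """Probability mass function of the Poisson(lam) law at the integer x >= 0."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def psi(x, lam):
    """d/du [g(x; u) / g(0; u)] at u = lam, i.e. d/du [u^x / x!]."""
    if x == 0:
        return 0.0
    return lam ** (x - 1) / math.factorial(x - 1)


def discrete_operator(f, x, lam0):
    """Forward difference of f(x) * psi(x; lam0), divided by the pmf at x."""
    forward_diff = f(x + 1) * psi(x + 1, lam0) - f(x) * psi(x, lam0)
    return forward_diff / poisson_pmf(x, lam0)


def closed_form(f, x, lam0):
    """exp(lam0) * (f(x + 1) - (x / lam0) * f(x))."""
    return math.exp(lam0) * (f(x + 1) - x / lam0 * f(x))


def expectation(f, lam0, lam_law, n_terms=80):
    """E[discrete_operator(X)] for X ~ Poisson(lam_law), truncated series."""
    return sum(discrete_operator(f, x, lam0) * poisson_pmf(x, lam_law)
               for x in range(n_terms))


f0 = lambda x: 1.0 / (1.0 + x)  # hypothetical bounded test function
lam0 = 3.0

max_gap = max(abs(discrete_operator(f0, x, lam0) - closed_form(f0, x, lam0))
              for x in range(30))
matched = expectation(f0, lam0, lam_law=3.0)     # numerically zero
mismatched = expectation(f0, lam0, lam_law=4.5)  # clearly non-zero
print(max_gap, matched, mismatched)
```

The matched expectation vanishes because the summands telescope, which is the discrete counterpart of writing the operator as a single differential.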



5. Uncovering new results 

In this final section, we tackle two examples which do not fall within the scope 
of the previous general results. In each case, we try to convey some intuition 
as to how our method works. As will appear, each of these cases requires the 
development of ad hoc arguments. 

5.1. The uniform distribution 

First take the target distribution $g$ to be the density of a uniform $U[a, b]$ for $a < b \in \mathbb{R}$, and define $a$ to be the parameter of interest. This law is not, stricto sensu, a member of the scale family. It is, however, easily seen that it belongs to $\mathcal{G}$ for all $a \neq b$ and satisfies Assumptions A and B at all $a < b$, with $b$ fixed. It is readily seen that the exchanging operator $T(f_0;a)(x) = \frac{x - b}{b - a}\, f_0\big((x - a)/(b - a)\big)$ yields the precious relationship (4.1), with $\tilde{f}(f_0;a)(x) = f_0\big((x - a)/(b - a)\big)$. This leads to the following result (the proof is left to the reader).

Corollary 5.1. Let $\mathcal{F}_0$ be the collection of all functions $f_0 : \mathbb{R} \to \mathbb{R}$ which are differentiable (in the sense of distributions) on $[0, 1]$. Then $\mathcal{F}_0$ is $a$-characterizing for $g$, with $a$-characterizing operator

w ,,)w - j±- a (frijs + A (|^)) - Am 

for $f_0 \in \mathcal{F}_0$.
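
The exchangeability relation invoked before Corollary 5.1 can be verified directly in the interior of the support. The computation below is a sketch that writes $u = (x - a)/(b - a)$, takes $g(x;a) = (b - a)^{-1}$ for $a < x < b$, and ignores the contributions of the endpoints:

$$ \partial_a u = \frac{x - b}{(b - a)^2}, \qquad \partial_x u = \frac{1}{b - a}, $$

$$ \partial_a\left(\frac{f_0(u)}{b - a}\right) = \frac{f_0(u)}{(b - a)^2} + \frac{x - b}{(b - a)^3}\, f_0'(u) = \partial_x\left(\frac{x - b}{(b - a)^2}\, f_0(u)\right). $$

The left-hand side is the derivative in $a$ of the test function $f_0(u)$ times the density, and the right-hand side is the derivative in $x$ of the exchanging operator times the density, so both sides of (4.1) agree on $(a, b)$.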

Similarly, one can also construct a $b$-characterizing operator and a $b$-characterization for the uniform law on $[a, b]$. A third way to characterize this law is to proceed as in Section 4.2 and construct a $\mu$-characterization, for $\mu$ a location parameter introduced by considering the density $g(x - \mu)$ and working, through Corollary 4.1, with respect to $\mu$. This yields the expression in Example 2.1.

5.2. The Student distribution 

Take the target distribution $g$ to be the density of a Student $T(\nu)$ with parameter of interest $\nu \in \mathbb{R}_0^+$, the tail weight parameter. This law belongs to $\mathcal{G}$ for all $\nu > 0$ and satisfies Assumption A at all $\nu > 0$. It is readily seen that the exchanging operator

T(/o; ./)(*) = -a; r(( „ + 1)/2) * (1 + v ) f °{-) 

yields the precious relationship (4.1), with 

Sufficient conditions on $f_0$ for the now usual requirements to be fulfilled are easily imposed. This leads to the following result.

Corollary 5.2. Fix $\nu > 2$. Let $\mathcal{F}_0$ be the collection of differentiable (in the sense of distributions) functions $f_0 : \mathbb{R} \to \mathbb{R}$ such that $|f_0(x^2)|/\sqrt{1 + x^2}$ and $|x f_0(x^2)|$ are $m_{\mathcal{X}}$-integrable. Then $\mathcal{F}_0$ is $\nu$-characterizing for $g$, with $\nu$-characterizing operator

%(f ,g)(x) = Z(x;u) (^x 2 f Q {^j f 

where f(a:;i/) = -r(i//2)(2i/ 2 r((i/ + 1V2))- 1 (l + a; 2 /!/)^. 

The proof of this result is mainly computational and follows along the same 
lines as that of all other similar results provided in this paper. 

It seems appropriate to conclude on this final example. Obviously, similar parameter-based characterizations can be obtained, by means of the same tools, for gamma, hypergeometric, Laplace, Pareto distributions, etc. As far as we know there exists no univariate characterization which cannot be obtained through our approach.

6. Applications

In all works related to Stein's method the characterization is merely the first step in a complicated and not a little mysterious process. In this paper we do not discuss the intricacies and subtleties of the method, and rather refer the uninitiated reader to the monographs [2, 3] or [11] for an overview. Moreover our parametric approach to the characterizations has to this date never been used for any application. The purpose of this section is to provide two simple and direct consequences of our vision. Deeper results are still under investigation.



6.1. Solving Stein equations 

Suppose that, for a given parametric target distribution $g$, we have at our disposal characterizations of the form $Z \sim g(\cdot;\theta_0) \iff E[\mathcal{T}_{\theta_0}(f, g)(Z)] = 0$ for all $f$ in $\mathcal{F}(g;\theta_0)$, where $\mathcal{T}_{\theta_0}(f, g)$ is a Stein operator. Then a Stein equation for $g$ at $\theta_0$ is a differential equation given by

$$ \mathcal{T}_{\theta_0}(h, g)(x) = l(x) \tag{6.1} $$

for $l : \mathcal{X} \to \mathbb{R}$ some function.

Example 6.1. In the Gaussian case, we obtain the location equation

$$ f'(x) - x f(x) = l(x) $$

and the scale equation

$$ x f'(x) + (1 - x^2) f(x) = l(x). $$

In the Exponential case we obtain the location equation

$$ (f'(x) - f(x))\, I_{\mathbb{R}^+}(x) = l(x) $$

and the scale equation

$$ (x f'(x) - (x - 1) f(x))\, I_{\mathbb{R}^+}(x) = l(x). $$

In the Poisson case we obtain the $\lambda$-equation

$$ \left( f(x + 1) - \frac{x}{\lambda_0} f(x) \right) I_{\mathbb{N}}(x) = l(x). $$

A careful reading of the different proofs provided in this paper shows that Theorem 3.1 not only yields Stein operators, but also solutions to the corresponding Stein equations (see equations (3.3), (A.1) and (A.4)). More specifically, our way of writing the operator (as a single differential) obviously allows for solving all such equations in a unified way by simple integration. Note in particular how, in the discrete case, the solution is obtained through straightforward summation. In other words our approach allows for solving all Stein equations in a routine fashion.
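
The "simple integration" can be illustrated on the Gaussian location equation $f'(x) - x f(x) = l(x)$. Writing $\varphi$ for the standard normal density, the left-hand side equals $(f\varphi)'/\varphi$, so $f(x) = \varphi(x)^{-1} \int_{-\infty}^x l(y)\varphi(y)\,dy$ whenever $l$ is centered under $\varphi$. The sketch below is an illustration under our own assumptions: it takes $l(x) = \cos(x) - e^{-1/2}$ (centered, since $E[\cos Z] = e^{-1/2}$ for standard normal $Z$), truncates the integral at $-12$, and checks the equation with a central finite difference. All function names are hypothetical.

```python
import math


def simpson(func, a, b, n=4000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def centered_h(x):
    """Right-hand side l(x) = cos(x) - E[cos(Z)], with E[cos(Z)] = exp(-1/2)."""
    return math.cos(x) - math.exp(-0.5)


def stein_solution(x, lower=-12.0):
    """f(x) = (1 / phi(x)) * integral of l * phi over [lower, x]."""
    return simpson(lambda y: centered_h(y) * phi(y), lower, x) / phi(x)


def residual(x, step=1e-4):
    """f'(x) - x f(x) - l(x), with f' approximated by a central difference."""
    derivative = (stein_solution(x + step) - stein_solution(x - step)) / (2 * step)
    return derivative - x * stein_solution(x) - centered_h(x)


max_residual = max(abs(residual(x)) for x in (-1.5, -0.5, 0.0, 0.7, 1.8))
print(max_residual)  # small: only discretisation error remains
```

The same pattern applies to any operator written as a single differential: integrate the right-hand side against the density, then divide by the density (and, where relevant, invert the exchanging operator).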

6.2. Stein's method and information theory

Although there are many consequences to our Theorem 3.1, perhaps the most 
intuitive is that it provides a hitherto unsuspected direct link between Stein's 
method and information theoretic tools. Such results are, however, outside the 
scope and purpose of the present work and will be the subject of separate 
publications. We nevertheless wish to suggest the flavor of this connection, and 
therefore conclude the paper with a particularly appealing result. 

Choose two parametric densities $p, q \in \mathcal{G}$ sharing the same support $S_\theta$. Take $f \in \mathcal{F}(p;\theta_0)$. We obviously have

$$ \mathcal{T}_{\theta_0}(f, p)(x) = \frac{d_\theta\big(f(x;\theta)\, p(x;\theta)\big)\big|_{\theta=\theta_0}}{p(x;\theta_0)} = \frac{d_\theta\big(f(x;\theta)\, q(x;\theta)\big)\big|_{\theta=\theta_0}}{q(x;\theta_0)}\, \frac{p(x;\theta_0)}{p(x;\theta_0)} + \frac{f(x;\theta_0)\, q(x;\theta_0)}{p(x;\theta_0)}\; d_\theta\left(\frac{p(x;\theta)}{q(x;\theta)}\right)\bigg|_{\theta=\theta_0}. $$

Straightforward simplifications then yield our final lemma.

Lemma 6.1 (Factorization of Stein operators). For all $f \in \mathcal{F}(p;\theta_0)$, we have

$$ \mathcal{T}_{\theta_0}(f, p)(x) = \mathcal{T}_{\theta_0}(f, q)(x) + f(x;\theta_0)\, r_{\theta_0}(p, q)(x), \tag{6.2} $$

with

$$ r_{\theta_0}(p, q)(x) := \frac{d_\theta\, p(x;\theta)\big|_{\theta=\theta_0}}{p(x;\theta_0)} - \frac{d_\theta\, q(x;\theta)\big|_{\theta=\theta_0}}{q(x;\theta_0)}. \tag{6.3} $$

We call the operator $r_{\theta_0}$ a generalized (standardized) score function because specifying the role of $\theta$ (location, scale, ...) as well as its nature (discrete, continuous) allows one to recover a whole family of score functions discussed in [21], [23] or [5]. Such an observation obviously has an intriguing number of immediate applications, but also opens new lines of research which are currently under investigation. See [25] for first results in this direction.
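
To illustrate (6.3) in the simplest setting, consider a location parameter. The following sketch assumes $p(x;\mu) = p_0(x - \mu)$ with $p_0$ differentiable and positive on $\mathbb{R}$, takes $q(x;\mu)$ to be the $\mathcal{N}(\mu, 1)$ density, and evaluates at $\mu_0 = 0$ (these choices are ours, made for illustration):

$$ \frac{\partial_\mu\, p_0(x - \mu)\big|_{\mu=0}}{p_0(x)} = -\frac{p_0'(x)}{p_0(x)}, \qquad \frac{\partial_\mu\, q(x;\mu)\big|_{\mu=0}}{q(x;0)} = x, $$

$$ r_0(p, q)(x) = -\left( \frac{p_0'(x)}{p_0(x)} + x \right). $$

Up to sign, this is the location score of $p_0$ corrected by the Gaussian score $-x$, and it vanishes identically when $p_0$ is the standard Gaussian density.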

Appendix A: Technical proofs 

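Proof of equality (3.4). First note that

$$ d_t\big(f_A(x;t)\, g(x;t)\big)\big|_{t=\theta} = d_t\left( \int_{\theta_0}^{t} l_A(x;u,t)\, g(x;u)\, dm_\Theta(u) \right)\bigg|_{t=\theta} = l_A(x;\theta,\theta)\, g(x;\theta) + \int_{\theta_0}^{\theta} d_t\big(l_A(x;u,t)\big)\big|_{t=\theta}\, g(x;u)\, dm_\Theta(u). $$

Now we have

$$ d_t\big(l_A(x;u,t)\big)\big|_{t=\theta} = d_t\big(I_{S_t}(x)\big)\big|_{t=\theta}\, \big(I_A(x) - P(Z_u \in A \mid Z_u \in S_\theta)\big) - d_t\big(P(Z_u \in A \mid Z_u \in S_t)\big)\big|_{t=\theta}\, I_{S_\theta}(x). $$

On the one hand, we easily see that the function

$$ H_1(x;\theta) := d_t\big(I_{S_t}(x)\big)\big|_{t=\theta} \int_{\theta_0}^{\theta} \big(I_A(x) - P(Z_u \in A \mid Z_u \in S_\theta)\big)\, g(x;u)\, dm_\Theta(u) $$

is well-defined, bounded uniformly in $\theta$ over $\Theta_0$ by a $m_{\mathcal{X}}$-integrable function and satisfies $H_1(x;\theta_0) = 0$. On the other hand, we have

$$ d_t\big(P(Z_u \in A \mid Z_u \in S_t)\big)\big|_{t=\theta} = \frac{d_t\big(P(Z_u \in A \cap S_t)\big)\big|_{t=\theta}}{P(Z_u \in S_\theta)} - d_t\big(P(Z_u \in S_t)\big)\big|_{t=\theta}\, \frac{P(Z_u \in A \cap S_\theta)}{\big(P(Z_u \in S_\theta)\big)^2}, $$

where clearly both derivatives are well-defined. Hence the function

$$ H_2(x;\theta) := I_{S_\theta}(x) \int_{\theta_0}^{\theta} d_t\big(P(Z_u \in A \mid Z_u \in S_t)\big)\big|_{t=\theta}\, g(x;u)\, dm_\Theta(u) $$

is also well-defined, bounded uniformly in $\theta$ over $\Theta_0$ by a $m_{\mathcal{X}}$-integrable function and satisfies $H_2(x;\theta_0) = 0$. Defining

$$ H(x;\theta) := H_1(x;\theta) - H_2(x;\theta) $$

we see that all the assertions in the proof of Theorem 3.1 hold, and, moreover, that

$$ d_t\big(f_A(x;t)\, g(x;t)\big)\big|_{t=\theta_0} = l_A(x;\theta_0,\theta_0)\, g(x;\theta_0) + H(x;\theta_0) = l_A(x;\theta_0,\theta_0)\, g(x;\theta_0). $$

This completes the proof of Theorem 3.1. □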

Proof of Corollary 4.1 (location). We apply the method described in Section 4.1.

Step 1: Choose $\tilde{f}(f_0;\mu)(x) = f_0(x - \mu)$.

Step 2: Set $T(f_0;\mu)(x) = -f_0(x - \mu)$.

Step 3: One easily sees that, for any $f_0 \in \mathcal{F}_0$, Conditions ($\mu$-i)-($\mu$-iii) on $f_0$ entail that Conditions (i)-(iii) are satisfied by $\tilde{f}(f_0;\mu)(x)$.

Step 4: Consider the solution of the Stein equation given by

$$ f_0^A(x - \mu_0) = -\frac{1}{g_0(x - \mu_0)} \left( \int_{x_0}^{x} l_A(y;\mu_0,\mu_0)\, g_0(y - \mu_0)\, dm_{\mathcal{X}}(y) + c(x) \right) $$

for some $x_0 \in \mathcal{X}$, where the function $x \mapsto c(x)$ has derivative (in the sense of distributions) equal to zero and is defined in such a way that $\left( \int_{x_0}^{x} l_A(y;\mu_0,\mu_0)\, g_0(y - \mu_0)\, dm_{\mathcal{X}}(y) + c(x) \right) d_x I_S(x - \mu_0) = 0$ over $\mathcal{X}$. This function can be expressed as a sum of Dirac delta functions whose vertices are determined by $d_x I_S(x - \mu_0)$. This yields the candidate solution

$$ f_0^A(x) = -\frac{1}{g_0(x)} \left( \int_{x_0}^{x + \mu_0} l_A(y;\mu_0,\mu_0)\, g_0(y - \mu_0)\, dm_{\mathcal{X}}(y) + c(x + \mu_0) \right). \tag{A.1} $$

For this function to belong to $\mathcal{F}_0$, we need Condition ($\mu$-ii), which is obvious, Condition ($\mu$-iii), which is also obvious thanks to Assumption A once again, and Condition ($\mu$-i) which will hold as soon as

$$ \left| \int_{\mathcal{X}} f_0^A(x)\, g_0(x)\, dm_{\mathcal{X}}(x) \right| = \left| \int_{\mathcal{X}} \int_{x_0}^{x + \mu_0} l_A(y;\mu_0,\mu_0)\, g_0(y - \mu_0)\, dm_{\mathcal{X}}(y)\, I_S(x)\, dm_{\mathcal{X}}(x) + C \right| < \infty, $$

where $C = \int_{\mathcal{X}} c(x + \mu_0)\, I_S(x)\, dm_{\mathcal{X}}(x)$ is finite. Since Assumption B then ensures that the quantity $\left| \int_{\mathcal{X}} f_0^A(x)\, g_0(x)\, dm_{\mathcal{X}}(x) \right|$ is bounded, Condition ($\mu$-i) is satisfied as well, which concludes the proof. □

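Proof of Corollary 4.3 (discrete). In this framework, the exchangeability condition (4.1) reads

$$ d_\theta\big(\tilde{f}(f_0;\theta)(x)\, g(x;\theta)\big)\big|_{\theta=\theta_0} = \Delta_x^+\big(T(f_0;\theta_0)(x)\, g(x;\theta_0)\big), \tag{A.2} $$

for some $f_0 \in \mathcal{F}_0$. In order to obtain the announced $\theta$-characterizing operator $\mathcal{T}_{\theta_0}(f_0, g)$, we define

$$ \tilde{f}(f_0;\theta)(x) = \frac{\Delta_x^+\big(f_0(x)\, g(x;\theta)\big)}{g(x;\theta)\, g(0;\theta)} \tag{A.3} $$

and the (invertible) exchanging operator

$$ T(f_0;\theta_0)(x) = f_0(x)\, \frac{d_\theta\big(g(x;\theta)/g(0;\theta)\big)\big|_{\theta=\theta_0}}{g(x;\theta_0)}. $$

One readily checks that these choices satisfy the exchangeability condition (A.2).

Fix $\theta_0 \in \Theta$. The sufficient condition is immediate. For the necessary condition to hold, we solve

$$ \Delta_x^+\big(T(f_0^A;\theta_0)(x)\, g(x;\theta_0)\big) = l_A(x;\theta_0,\theta_0)\, g(x;\theta_0), $$

with $l_A$ as before, to obtain the candidate solution

$$ f_0^A(x) = \big(\psi(x;\theta_0)\big)^{-1} \sum_{j=0}^{x-1} l_A(j;\theta_0,\theta_0)\, g(j;\theta_0), \tag{A.4} $$

where the sum over an empty set is 0. Assumption A' guarantees that this function belongs to $\mathcal{F}_0$. □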